Exploring the Markov Chain Monte Carlo method. There isn’t any dataset in today’s tutorial, only simulated examples.
Estimate the P(Head) of a biased coin. In particular, estimate the posterior probability P(Head|Observed Data). Start by simulating some coin flips, where we know \(p=\)P(Head). Then, use MCMC to see if \(p\) can be estimated from the data of the coin flips.
Below is a way to simulate coin flips with chance of success \(\theta = p = 0.7\) but you can use other simulated data if you wish. Denote these simulated values as the random variables \(X_1, X_2, \ldots, X_n\) where \(n = 200\)
set.seed(5003)
theta = 0.7 # P(Head)
N = 200
# simulate some coin flips
coin.flips <- rbinom(n = N, size = 1, prob = theta)
Recall the Likelihood approach covered in a earlier module. In this case, each observation is assumed to be drawn from a Bernoulli (Binomial with \(n = 1\)) distribution. The formula is,
\[ P(X = x|\theta) = \begin{cases} \theta^x(1 - \theta)^{(1 - x)}, & \text{ if } x = 0 \text{ or }1;\\ 0, & \text{otherwise}.\end{cases} \]
and it can be computed with dbinom. Write a function
that computes the Likelihood of the parameter \(\theta\) given all of the observed
data.
\[ L(\theta|X_1, X_2,\ldots, X_n) = \prod_{i = 1}^n P(X = x_i|\theta) \]
likelihood <- function(theta, x)
prod(dbinom(x, size = 1, prob = theta))
Write a Metropolis Hastings algorithm that takes a specified initial value for the parameter \(\theta\) (or you can make it random if you wish) and then proposes a new value with a random walk sampler. For simplicity it is recommended to use the Gaussian density as the proposal density \(q\). Psuedo-code of the Metropolis Hasting algorithm is given below. It needs some work that is obvious, other work required are the proposed theta needs to be between 0 and 1, it will need some code to ensure an impossible theta isn’t generated.
# In this example, we set the prior of P(Head) to be a uniform distribution
niters <- 1000
curr.theta <- 0.5 # This is the parameter we want to find; initialise it
theta.chain <- numeric(niters) # Initialise the vector to be created
for(i in 0:(niters - 1)) {
proposal.theta <- ... # A random draw from the proposal density that is conditioned on the previous iteration.
curr.likeli <- ... # computed likelihood from the current theta
proposal.likeli <- ... # computed likelihood from the proposed theta
alpha <- min(..., 1)
# Randomly decide whether to accept our new proposal
if (runif(1) < alpha)
curr.theta <- ...
theta.chain[i + 1] <- ...
}
niters <- 1000
curr.theta <- 0.5 # This is the parameter we want to find; initialise it
theta.chain <- numeric(niters) # Initialise the vector to be created
for(i in 0:(niters - 1)) {
proposal.theta <- curr.theta + rnorm(1, mean = 0, sd = 0.05) # A random draw from the proposal density that is conditioned on the previous iteration.
if (proposal.theta < 0 || proposal.theta > 1)
{
theta.chain[i + 1] <- curr.theta
next
}
curr.likeli <- likelihood(curr.theta, coin.flips) # computed likelihood from the current theta
proposal.likeli <- likelihood(proposal.theta, coin.flips) # computed likelihood from the proposed theta
alpha <- min(proposal.likeli/curr.likeli, 1)
# Randomly decide whether to accept our new proposal
if (runif(1) < alpha)
curr.theta <- proposal.theta
theta.chain[i + 1] <- curr.theta
}
plot(theta.chain)
There was a typo in the original version of this that made the chain above not converge. It now works quite well and the remaining analysis of this question not required. However, if hypothetically, the computed likelihoods are too small for the machine to handle accurately (division by a small value can be unstable). Then, an alternative it to compute the product and division of the likelihood in the log domain (where it is a sum!)
The log likelihood can be computed to stabilize the computation above
where the computed likelihood values fall below the
.Machine$double.eps, roughly speaking the smallest possible
value that the computer can handle with precision. (see
? .Machine).
Write a function that computes the log-Likelihood of the parameter \(\theta\) given all of the observed data.
\[ \mathcal{L}(\theta|X_1, X_2,\ldots, X_n) = \log L(\theta|X_1, X_2,\ldots, X_n) = \log(\prod_{i = 1}^n P(X = x_i|\theta)) = \sum_{i = 1}^n \log P(X = x_i|\theta) \]
Hint: The argument of log is useful in
the dbinom function. (See ? dbinom).
log.likelihood <- function(theta, x)
sum(dbinom(x, size = 1, prob = theta, log = TRUE))
Amend the Metropolis-Hastings algorithm above to use the log likelihood instead. Both the likelihood computations will now need to compute the log likelihood and the acceptance probability \(\alpha\) will need to be modified.
niters <- 1000
curr.theta <- 0.5 # This is the parameter we want to find; initialise it
theta.chain <- numeric(niters) # Initialise the vector to be created
for(i in 0:(niters - 1)) {
proposal.theta <- ... # A random draw from the proposal density that is conditioned on the previous iteration.
curr.log.likeli <- ... # computed log likelihood from the current theta
proposal.log.likeli <- ... # computed log likelihood from the proposed theta
alpha <- min(..., 1) # This needs to be updated (hint, exp cancels out log)
# Randomly decide whether to accept our new proposal
if (runif(1) < alpha)
curr.theta <- ...
theta.chain[i + 1]
}
# In this example, we set the prior of P(Head) to be a uniform distribution
niters <- 1000
curr.theta <- 0.5 # This is the parameter we want to find; initialise it
theta.chain <- numeric(niters)
for(i in 1:niters) {
proposal.theta <- curr.theta + rnorm(1, mean = 0, sd = 0.05)
# If proposed theta leaves the feasible range of values
if(proposal.theta > 1 || proposal.theta < 0) {
theta.chain[i + 1] <- curr.theta
next
}
# It is easier to work with log of probabilities
curr.log.likeli <- log.likelihood(curr.theta, coin.flips)
proposal.log.likeli <- log.likelihood(proposal.theta, coin.flips)
# Because we have taken log(prob), now we need a slightly different
# formula to get the acceptance probability. Follows from exp(log(a/b)) = exp(log(a) - log(b))
alpha <- min(exp(proposal.log.likeli - curr.log.likeli), 1)
# Randomly decide whether to accept our new proposal
if (runif(1) < alpha)
curr.theta <- proposal.theta
theta.chain[i + 1] <- curr.theta
}
You can see from the plot that the parameter of theta converges to around 0.7. The first part of the plot is the burn-in. We should discard it and use the rest of the chain to estimate theta. This plot looks like a hairy caterpillar which means it has converged properly.
plot(theta.chain, type = "l")
hist(theta.chain[100:1000], n = 30)
# MCMC estimates of theta after 1000 iterations
print(mean(theta.chain[100:1000]))
## [1] 0.6506242
print(sd(theta.chain[100:1000]))
## [1] 0.03286099
plot(theta.chain[100:1000], type = 'l')
Use the MCMC procedure to generate random samples from the following probability distribution function. This function is simply the sum of two normal distributions. Explore the use of the proposal distribution to explore the target space in the MCMC algorithm.
p <- 0.4
mn <- c(-1, 2)
std <- c(0.2, 1.5)
f <- function(x, mu = mn, sd = std) p * dnorm(x, mu[1], sd[1]) + (1 - p) * dnorm(x, mu[2], sd[2])
curve(f(x), col = "red", from = -4, to = 8, n = 300, las = 1)
Use the Gaussian density centered on the current point as the
proposal transition. Consider the random walk Metropolis-Hastings
algorithm again (symmetric proposal distribution). Define a function
\(q\) in R to be a sample
from a Gaussian distribution but with two parameters, \(x\) to denote the current value of the
markov chain and \(\sigma\) to be the
sd of the Gaussian distribution.
q <- function(x, sigma) ...(1, mean = ..., sd = ...)
q <- function(x, sigma) rnorm(1, mean = x, sd = sigma)
Write a function that uses a random walk Metropolis-Hastings sampling
of the target distribution f that uses your proposal
function q but has argument, sigma to control
the sd parameter of the proposal distribution. i.e. your function should
have signature.
#' x_0 : Initial value of the Markov chain
#' n : number of iterations to compute
#' sigma : size of the sd in the proposal distribution
function(x_0, n, sigma)
# Function that takes a starting value and a number of iterations to compute
mc.algorithm <- function(x_0, n, sigma)
{
x <- numeric(n)
x[1] <- x_0
for (i in 1:(n - 1))
{
prop.x <- q(x[i], sigma = sigma)
alpha <- min(f(prop.x)/f(x[i]), 1)
if (runif(1) < alpha)
x[i + 1] <- prop.x
else
x[i + 1] <- x[i]
}
x
}
x_0 and sigmaSample the target distribution using the random walk
Metropolis-Hastings sampling with different choices of x_0
and sigma and view and interpret the output. Consider how
the different choices of the starting point x_0 and the
ability for the proposal to explore the space with different values of
sigma impact the results of the algorithm. You can choose
your own different parameters or use these ones provided
x_0.vals <- c(-1, 1, 5) and
sigma.vals <- c(0.01, 0.5, 20). To visualize your
results, consider plotting the probability histogram of the outputs and
drawing the true target density on top.
Hint: expand.grid might be a useful
function to look at many combinations of choices of x_0 and
sigma
set.seed(5003)
sigma <- c(0.01, 0.5, 20)
x_0 <- c(-1, 1, 5)
params <- expand.grid(x_0, sigma)
x <- seq(from = -4, to = 8, length.out = 1000)
invisible(lapply(1:nrow(params), function(i) {
x_0 <- params[i, 1]
sigma <- params[i, 2]
output <- mc.algorithm(x_0, n = 50000, sigma = sigma)
hist(output, probability = TRUE,
main = paste0("x_0 = ", x_0, "; sigma = ", sigma),
breaks = seq(from = -5, to = 10, by = 0.2))
lines(x, f(x), col = 'red')
}))
Looking at the outputs, we can see that sigma too small means the algorithm gets stuck and doesn’t explore the space sufficiently. Conversely sigma too large, meaning some proposals will jump beyond the reasonable possibilities of the target distribution and be rejected and the Markov chain wont move enough.